Abstract. Suppose that multiple experts (or learning algorithms) provide us with alternative
Bayesian network (BN) structures over a domain, and that we are interested in combining
them into a single consensus BN structure. Specifically, we require that the consensus BN
structure represents only independences that all the given BN structures agree upon, and
that it has as few parameters associated as possible. In this paper, we prove that there
may exist several non-equivalent consensus BN structures and that finding one of them is
NP-hard. Thus, we decide to resort to heuristics to find an approximated consensus BN
structure. In this paper, we consider the heuristic proposed in (Matzkevich and Abramson,
1992, 1993a,b). This heuristic builds upon two algorithms, called Methods A and B, for
efficiently deriving the minimal directed independence map of a BN structure relative to a
given node ordering. Methods A and B are claimed to be correct although no proof is provided
(a proof is just sketched). In this paper, we show that Methods A and B are not correct and
propose a correction of them.



1. Introduction

Bayesian networks (BNs) are a popular graphical formalism for representing probability 
distributions. A BN consists of structure and parameters. The structure, a directed and 
acyclic graph (DAG), induces a set of independencies that the represented probability distri- 
bution satisfies. The parameters specify the conditional probability distribution of each node 
given its parents in the structure. The BN represents the probability distribution that results 
from the product of these conditional probability distributions. Typically, a single expert (or 
learning algorithm) is consulted to construct a BN of the domain at hand. Therefore, there is 
a risk that the so-constructed BN is not as accurate as it could be if, for instance, the expert 
has a bias or overlooks certain details. One way to minimize this risk consists in obtaining 
multiple BNs of the domain from multiple experts and, then, combining them into a single
consensus BN. This approach has received significant attention in the literature (Matzkevich
and Abramson, 1992, 1993a,b; Maynard-Reid II and Chajewska, 2001; Nielsen and Parsons,
2007; Pennock and Wellman, 1999; Richardson and Domingos, 2003; del Sagrado and Moral,
2003). The most relevant of these references is probably (Pennock and Wellman, 1999),
because it shows that even if the experts agree on the BN structure, no method for combining
the experts' BNs produces a consensus BN that respects some reasonable assumptions and 
whose structure is the agreed BN structure. Unfortunately, this problem is often overlooked. 
To avoid it, we propose to combine the experts' BNs in two steps. First, finding the consensus 
BN structure and, then, finding the consensus parameters for the consensus BN structure. 
This paper focuses only on the first step. Specifically, we assume that multiple experts pro- 
vide us with alternative DAG models of a domain, and we are interested in combining them 
into a single consensus DAG. Specifically, we are interested in that the consensus DAG only 
represents independences all the given DAGs agree upon and as many of them as possible. In 
other words, the consensus DAG is the DAG that represents the most independences among 
all the minimal directed independence (MDI) maps of the intersection of the independence 
models induced by the given DAGs.¹ To our knowledge, whether the consensus DAG can or
cannot be found efficiently is still an open problem. See (Matzkevich and Abramson, 1992,
1993a) for more information. In this paper, we redefine the consensus DAG as the DAG
that has the fewest parameters associated among all the MDI maps of the intersection of
the independence models induced by the given DAGs. This definition is in line with that of 
finding a DAG to represent a probability distribution p. The desired DAG is typically defined 
as the MDI map of p that has the fewest parameters associated rather than as the MDI map 
of p that represents the most independences. See, for instance, (Chickering et al., 2004). The
number of parameters associated with a DAG is a measure of the complexity of the DAG, 
since it is the number of parameters required to specify all the probability distributions that 
can be represented by the DAG. 

In this paper, we prove that there may exist several non-equivalent consensus DAGs and
that finding one of them is NP-hard. Thus, we decide to resort to heuristics to find an
approximated consensus DAG. In this paper, we consider the following heuristic due to
Matzkevich and Abramson (1992, 1993a,b). First, let $\alpha$ denote any ordering of the nodes
in the given DAGs, which we denote here as $G^1, \ldots, G^m$. Then, find the MDI map
$G^i_\alpha$ of each $G^i$ relative to $\alpha$. Finally, let the approximated consensus DAG be
the DAG whose arcs are exactly the union of the arcs in $G^1_\alpha, \ldots, G^m_\alpha$. It
should be mentioned that our formulation of the heuristic differs from that in (Matzkevich and
Abramson, 1992, 1993a,b) in the following two points. First, the heuristic was introduced
under the original definition of consensus DAG. We justify later that the heuristic also makes
sense under our definition of consensus DAG. Second, $\alpha$ was originally required to be
consistent with one of the given DAGs. We remove this requirement. All in all, a key step in
the heuristic is finding the MDI map $G^i_\alpha$ of each $G^i$. Since this task is not trivial,
Matzkevich and Abramson (1993b) present two algorithms, called Methods A and B, for
efficiently deriving $G^i_\alpha$ from $G^i$. Methods A and B are claimed to be correct although
no proof is provided (a proof is just sketched). In this paper, we show that Methods A and B
are not correct and propose a correction of them.

As said, we are not the first to study the problem of finding the consensus DAG. In addition
to the works discussed above by Matzkevich and Abramson (1992, 1993a,b) and Pennock
and Wellman (1999), some other works devoted to this problem are (Maynard-Reid II and
Chajewska, 2001; Nielsen and Parsons, 2007; Richardson and Domingos, 2003; del Sagrado
and Moral, 2003). We elaborate below on the differences between these works and ours.
Maynard-Reid II and Chajewska (2001) propose to adapt existing score-based algorithms for
learning DAGs from data to the case where the learning data is replaced by the BNs provided
by some experts. Their approach suffers from the problem pointed out by Pennock and Wellman
(1999), because it consists essentially in learning a consensus DAG from a combination of the
given BNs. A somewhat related approach is proposed by Richardson and Domingos (2003).
Specifically, they propose a Bayesian approach to learning DAGs from data, where the prior 
probability distribution over DAGs is constructed from the DAGs provided by some experts. 
Since their approach requires data and does not combine the given DAGs into a single DAG, 
it addresses a problem rather different from the one in this paper. Moreover, the construction 
of the prior probability distribution over DAGs ignores the fact that some given DAGs may 
be different but equivalent. That is, unlike in the present work, a DAG is not interpreted as 
inducing an independence model. A work that is relatively close to ours is that by del Sagrado
and Moral (2003). Specifically, they show how to construct a MDI map of the intersection and
union of the independence models induced by the DAGs provided by some experts. However,
there are three main differences between their work and ours. First, unlike us, they do not
assume that the given DAGs are defined over the same set of nodes. Second, unlike us, they
assume that there exists a node ordering that is consistent with all the given DAGs. Third,
their goal is to find a MDI map whereas ours is to find the MDI map that has the fewest
parameters associated among all the MDI maps, i.e. the consensus DAG. Finally, Nielsen
and Parsons (2007) develop a general framework to construct the consensus DAG gradually.
Their framework is general in the sense that it is not tailored to any particular definition of
consensus DAG. Instead, it relies upon a score to be defined by the user and that each expert
will use to score different extensions to the current partial consensus DAG. The individual
scores are then combined to choose the extension to perform. Unfortunately, we do not see
how this framework could be applied to our definition of consensus DAG. Specifically, we do
not see how each expert could score the extensions independently of the other experts, what
the score would look like, or how the scores would be combined.

¹ It is worth mentioning that the term consensus DAG has a different meaning in computational
biology (Jackson et al., 2005). There, the consensus DAG of a given set of DAGs
$G^1, \ldots, G^m$ is defined as the DAG that contains the most of the arcs in
$G^1, \ldots, G^m$. Therefore, the difficulty lies in keeping as many arcs as possible without
creating cycles. Note that, unlike in the present work, a DAG is not interpreted as inducing
an independence model in (Jackson et al., 2005).

It is worth recalling that this paper deals with the combination of probability distributions
expressed as BNs. Those readers interested in the combination of probability distributions
expressed in non-graphical numerical forms are referred to, for instance, (Genest and Zidek,
1986). Note also that we are interested in the combination before any data is observed. Those
readers interested in the combination after some data has been observed and each expert
has updated her beliefs accordingly are referred to, for instance, (Ng and Abramson, 1994).
Finally, note also that we aim at combining the given DAGs into a DAG, the consensus DAG.
Those readers interested in finding not a DAG but graphical features (e.g. arcs or paths) all
or a significant number of experts agree upon may want to consult (Friedman and Koller,
2003; Hartemink et al., 2002; Peña et al., 2004), since these works deal with a similar problem.

The rest of the paper is organized as follows. We start by reviewing some preliminary
concepts in Section 2. We analyze the complexity of finding the consensus DAG in Section
3. We discuss the heuristic for finding an approximated consensus DAG in more detail in
Section 4. We introduce Methods A and B in Section 5 and show that they are not correct.
We correct them in Section 6. We analyze the complexity of the corrected Methods A and B
in Section 7 and show that they are more efficient than any other approach we can think of
to solve the same problem. We close with some discussion in Section 8.



2. Preliminaries 

In this section, we review some concepts used in this paper. All the DAGs, probability
distributions and independence models in this paper are defined over $V$, unless otherwise
stated. If $A \rightarrow B$ is in a DAG $G$, then we say that $A$ and $B$ are adjacent in $G$.
Moreover, we say that $A$ is a parent of $B$ and $B$ a child of $A$ in $G$. We denote the
parents of $B$ in $G$ by $Pa_G(B)$. A node is called a sink node in $G$ if it has no children
in $G$. A route between two nodes $A$ and $B$ in $G$ is a sequence of nodes starting with $A$
and ending with $B$ such that every two consecutive nodes in the sequence are adjacent in $G$.
Note that the nodes in a route are not necessarily distinct. The length of a route is the number
of (not necessarily distinct) arcs in the route. We treat all the nodes in $G$ as routes of length
zero. A route between $A$ and $B$ is called descending from $A$ to $B$ if all the arcs in the
route are directed towards $B$. If there is a descending route from $A$ to $B$, then $B$ is
called a descendant of $A$. Note that $A$ is a descendant of itself, since we allow routes of
length zero. Given a subset $X \subseteq V$, a node $A \in X$ is called maximal in $G$ if $A$ is
not a descendant of any node in $X \setminus \{A\}$ in $G$. Given a route $\rho$ between $A$
and $B$ in $G$ and a route $\rho'$ between $B$ and $C$ in $G$, $\rho \cup \rho'$ denotes the
route between $A$ and $C$ in $G$ resulting from appending $\rho'$ to $\rho$.

The number of parameters associated with a DAG $G$ is
$\sum_{B \in V} [\prod_{A \in Pa_G(B)} r_A](r_B - 1)$, where $r_A$ and $r_B$ are the numbers
of states of the random variables corresponding to the nodes $A$ and $B$. An arc
$A \rightarrow B$ in $G$ is said to be covered if $Pa_G(A) = Pa_G(B) \setminus \{A\}$. By covering
an arc $A \rightarrow B$ in $G$ we mean adding to $G$ the smallest set of arcs so that
$A \rightarrow B$ becomes covered. We say that a node $C$ is a collider in a route in a DAG if
there exist two nodes $A$ and $B$ such that $A \rightarrow C \leftarrow B$ is a subroute of the
route. Note that $A$ and $B$ may coincide. Let $X$, $Y$ and $Z$ denote three disjoint subsets of
$V$. A route in a DAG is said to be $Z$-active when (i) every collider node in the route is in
$Z$, and (ii) every non-collider node in the route is outside $Z$. When there is no route in a
DAG $G$ between a node in $X$ and a node in $Y$ that is $Z$-active, we say that $X$ is separated
from $Y$ given $Z$ in $G$ and denote it as $X \perp_G Y | Z$. We denote by
$X \not\perp_G Y | Z$ that $X \perp_G Y | Z$ does not hold. This definition of separation is
equivalent to other more common definitions (Studený, 1998, Section 5.1).
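The parameter count and the notion of covering an arc can be illustrated with a minimal Python sketch. The representation is an assumption made here for illustration only: a DAG is a dictionary mapping each node to the set of its parents, and `states` maps each node to its number of states. The function names are hypothetical and do not come from the paper.

```python
from math import prod


def num_parameters(parents, states):
    """Sum over nodes B of (product of r_A over the parents A of B) * (r_B - 1)."""
    return sum(
        prod(states[a] for a in parents[b]) * (states[b] - 1) for b in parents
    )


def is_covered(parents, a, b):
    """True if the arc a -> b exists and Pa(a) equals Pa(b) minus {a}."""
    return a in parents[b] and parents[a] == parents[b] - {a}


def cover(parents, a, b):
    """Return a copy of the DAG with the fewest arcs added so that a -> b is covered."""
    g = {v: set(ps) for v, ps in parents.items()}
    both = g[a] | (g[b] - {a})  # common parent set required for coverage
    g[a] = set(both)
    g[b] = both | {a}
    return g


# Chain A -> B -> C over binary variables (an illustrative example, not from the paper).
chain = {"A": set(), "B": {"A"}, "C": {"B"}}
binary = {"A": 2, "B": 2, "C": 2}
print(num_parameters(chain, binary))                   # parameters of the chain
print(num_parameters(cover(chain, "B", "C"), binary))  # after covering B -> C
```

Covering an arc can only add arcs, so the parameter count never decreases, which is the fact the NP-hardness proof relies on later.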



Let $X$, $Y$, $Z$ and $W$ denote four disjoint subsets of $V$. Let us abbreviate $X \cup Y$ as
$XY$. An independence model $M$ is a set of statements of the form $X \perp_M Y | Z$, meaning
that $X$ is independent of $Y$ given $Z$. Given a subset $U \subseteq V$, we denote by $[M]_U$
all the statements in $M$ such that $X, Y, Z \subseteq U$. Given two independence models $M$
and $N$, we denote by $M \subseteq N$ that if $X \perp_M Y | Z$ then $X \perp_N Y | Z$. We say
that $M$ is a graphoid if it satisfies the following properties: symmetry
$X \perp_M Y | Z \Rightarrow Y \perp_M X | Z$, decomposition
$X \perp_M YW | Z \Rightarrow X \perp_M Y | Z$, weak union
$X \perp_M YW | Z \Rightarrow X \perp_M Y | ZW$, contraction
$X \perp_M Y | ZW \wedge X \perp_M W | Z \Rightarrow X \perp_M YW | Z$, and intersection
$X \perp_M Y | ZW \wedge X \perp_M W | ZY \Rightarrow X \perp_M YW | Z$. The independence
model induced by a probability distribution $p$, denoted as $I(p)$, is the set of probabilistic
independences in $p$. The independence model induced by a DAG $G$, denoted as $I(G)$, is the
set of separation statements $X \perp_G Y | Z$. It is known that $I(G)$ is a graphoid (Studený
and Bouckaert, 1998, Lemma 3.1). Moreover, $I(G)$ satisfies the composition property
$X \perp_G Y | Z \wedge X \perp_G W | Z \Rightarrow X \perp_G YW | Z$ (Chickering and Meek, 2002,
Proposition 1). Two DAGs $G$ and $H$ are called equivalent if $I(G) = I(H)$.

A DAG $G$ is a directed independence map of an independence model $M$ if $I(G) \subseteq M$.
Moreover, $G$ is a minimal directed independence (MDI) map of $M$ if removing any arc from
$G$ makes it cease to be a directed independence map of $M$. We say that $G$ and an ordering
of its nodes are consistent when, for every arc $A \rightarrow B$ in $G$, $A$ precedes $B$ in the
node ordering. We say that a DAG $G_\alpha$ is a MDI map of an independence model $M$ relative
to a node ordering $\alpha$ if $G_\alpha$ is a MDI map of $M$ and $G_\alpha$ is consistent with
$\alpha$. If $M$ is a graphoid, then $G_\alpha$ is unique (Pearl, 1988, Theorems 4 and 9).
Specifically, for each node $A$, $Pa_{G_\alpha}(A)$ is the smallest subset $X$ of the predecessors
of $A$ in $\alpha$, $Pre_\alpha(A)$, such that $A \perp_M Pre_\alpha(A) \setminus X | X$.
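The characterization above can be turned into a brute-force procedure when $M = I(G)$ for a DAG $G$. The sketch below is only illustrative and rests on two assumptions that are not part of the paper: a DAG is a dictionary from nodes to parent sets, and separation is tested with the standard moralized-ancestral-graph criterion, which is one of the equivalent common definitions of separation. The search over subsets is exponential, so this is a reference implementation for small examples rather than an efficient method.

```python
from itertools import combinations


def d_separated(parents, xs, ys, zs):
    """Test X _|_ Y | Z in a DAG via the moralized ancestral graph."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    # 1. Smallest ancestral set containing X, Y and Z.
    anc, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])
    # 2. Moralize: link each node to its parents and marry the parents.
    nbrs = {v: set() for v in anc}
    for v in anc:
        ps = list(parents[v])
        for i, p in enumerate(ps):
            nbrs[v].add(p)
            nbrs[p].add(v)
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # 3. X and Y are separated iff no undirected path avoids Z.
    seen, stack = set(), list(xs)
    while stack:
        v = stack.pop()
        if v in seen or v in zs:
            continue
        if v in ys:
            return False
        seen.add(v)
        stack.extend(nbrs[v])
    return True


def mdi_map(parents, alpha):
    """MDI map of I(G) relative to the ordering alpha, by exhaustive search."""
    result = {}
    for k, a in enumerate(alpha):
        pre = alpha[:k]
        for size in range(len(pre) + 1):  # smallest subsets first
            hit = next(
                (set(x) for x in combinations(pre, size)
                 if d_separated(parents, {a}, set(pre) - set(x), x)),
                None,
            )
            if hit is not None:
                result[a] = hit
                break
    return result


# Illustrative example: the v-structure A -> C <- B reordered as (C, A, B).
v_structure = {"A": set(), "B": set(), "C": {"A", "B"}}
print(mdi_map(v_structure, ("C", "A", "B")))
```

Placing the collider first in the ordering forces an extra arc between A and B, which is the typical price of an ordering that is not consistent with the input DAG.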

3. Finding a Consensus DAG is NP-Hard 

Recall that we have defined the consensus DAG of a given set of DAGs $G^1, \ldots, G^m$ as the
DAG that has the fewest parameters associated among all the MDI maps of
$\cap_{i=1}^m I(G^i)$. A sensible way to start the quest for the consensus DAG is by
investigating whether there can exist several non-equivalent consensus DAGs. The following
theorem answers this question.

Theorem 1. There exists a set of DAGs that has two non-equivalent consensus DAGs.

Proof. Consider the following two DAGs over four random variables with the same number 
of states each: 

I <- J I J 

; ; 

K -> L K <- L 

Any of the following two non-equivalent DAGs is the consensus DAG of the two DAGs 
above: 

I -> J I <- J 

I \ t t S I 

K <- L K -> L 

□ 
A natural follow-up question to investigate is whether a consensus DAG can be found
efficiently. Unfortunately, finding a consensus DAG is NP-hard, as we prove below. Specifically,
we prove that the following decision problem is NP-hard:

CONSENSUS

• INSTANCE: A set of DAGs $G^1, \ldots, G^m$ over $V$, and a positive integer $d$.

• QUESTION: Does there exist a DAG $G$ over $V$ such that
$I(G) \subseteq \cap_{i=1}^m I(G^i)$ and the number of parameters associated with $G$ is not
greater than $d$?

Proving that CONSENSUS is NP-hard implies that finding the consensus DAG is also 
NP-hard, because if there existed an efficient algorithm for finding the consensus DAG, then 
we could use it to solve CONSENSUS efficiently. Our proof makes use of the following two 
decision problems: 

FEEDBACK ARC SET

• INSTANCE: A directed graph $G = (V, A)$ and a positive integer $k$.

• QUESTION: Does there exist a subset $B \subseteq A$ such that $|B| \leq k$ and $B$ has at
least one arc from every directed cycle in $G$?
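To make the FEEDBACK ARC SET question concrete, the following Python sketch checks a candidate answer. It relies on the observation that $B$ contains an arc from every directed cycle exactly when removing $B$ leaves an acyclic graph, which is tested here with Kahn's topological sorting. The function name and the encoding of arcs as pairs are assumptions made for illustration.

```python
def is_feedback_arc_set(nodes, arcs, b, k):
    """Check that B is a subset of the arcs, |B| <= k, and removing B leaves no cycle."""
    b = set(b)
    if len(b) > k or not b <= set(arcs):
        return False
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for u, v in arcs:
        if (u, v) not in b:
            children[u].append(v)
            indegree[v] += 1
    # Kahn's algorithm: repeatedly remove nodes without incoming arcs.
    queue = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return removed == len(nodes)  # every node removed iff the graph is acyclic


# Illustrative example: a directed triangle needs one arc removed.
triangle = [(1, 2), (2, 3), (3, 1)]
print(is_feedback_arc_set([1, 2, 3], triangle, {(3, 1)}, 1))
```

Verifying a candidate is easy; the hardness lies in finding a small enough $B$.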

LEARN

• INSTANCE: A probability distribution $p$ over $V$, and a positive integer $d$.

• QUESTION: Does there exist a DAG $G$ over $V$ such that $I(G) \subseteq I(p)$ and the
number of parameters associated with $G$ is not greater than $d$?

FEEDBACK ARC SET is NP-complete (Garey and Johnson, 1979). FEEDBACK ARC
SET remains NP-complete for directed graphs in which the total degree of each vertex is
at most three (Gavril, 1977). This degree-bounded FEEDBACK ARC SET problem is used
in (Chickering et al., 2004) to prove that LEARN is NP-hard. In their proof, Chickering
et al. (2004) use the following polynomial reduction of any instance of the degree-bounded
FEEDBACK ARC SET into an instance of LEARN:

• Let the instance of the degree-bounded FEEDBACK ARC SET consist of the directed
graph $F = (V^F, A^F)$ and the positive integer $k$.

• Let $L$ denote a DAG whose nodes and arcs are determined from $F$ as follows. For
every arc $V_i \rightarrow V_j$ in $A^F$, create the following nodes and arcs in $L$:

[Diagram of the nodes and arcs created in $L$ for one arc of $F$. Nodes (number of states):
$A_{ij}$ (9), $B_{ij}$ (2), $C_{ij}$ (3), $D_{ij}$ (9), $E_{ij}$ (2), $F_{ij}$ (2), $G_{ij}$ (9),
$H_{ij}$ (2), $V_i$ (9), $V_j$ (9).]

The number in parenthesis besides each node is the number of states of the corresponding
random variable. Let $H^L$ denote all the nodes $H_{ij}$ in $L$, and let $V^L$ denote
the rest of the nodes in $L$.

• Specify a (joint) probability distribution $p(H^L, V^L)$ such that
$I(p(H^L, V^L)) = I(L)$.

• Let the instance of LEARN consist of the (marginal) probability distribution $p(V^L)$
and the positive integer $d$, where $d$ is computed from $F$ and $k$ as shown in (Chickering
et al., 2004, Equation 2).



We now describe how the instance of LEARN resulting from the reduction above can be
further reduced into an instance of CONSENSUS in polynomial time:

• Let $C^1$ denote the DAG over $V^L$ that has all and only the arcs in $L$ whose both
endpoints are in $V^L$.

• Let $C^2$ denote the DAG over $V^L$ that only has the arcs
$B_{ij} \rightarrow C_{ij} \leftarrow F_{ij}$ for all $i$ and $j$.

• Let $C^3$ denote the DAG over $V^L$ that only has the arcs
$C_{ij} \rightarrow F_{ij} \leftarrow E_{ij}$ for all $i$ and $j$.

• Let the instance of CONSENSUS consist of the DAGs $C^1$, $C^2$ and $C^3$, and the positive
integer $d$.

Theorem 2. CONSENSUS is NP-hard. 

Proof. We start by proving that there is a polynomial reduction of any instance $\mathcal{F}$
of the degree-bounded FEEDBACK ARC SET into an instance $\mathcal{C}$ of CONSENSUS. First,
reduce $\mathcal{F}$ into an instance $\mathcal{L}$ of LEARN as shown in (Chickering et al.,
2004) and, then, reduce $\mathcal{L}$ into $\mathcal{C}$ as shown above.

We now prove that there is a solution to $\mathcal{F}$ iff there is a solution to $\mathcal{C}$.
Theorems 8 and 9 in (Chickering et al., 2004) prove that there is a solution to $\mathcal{F}$ iff
there is a solution to $\mathcal{L}$. Therefore, it only remains to prove that there is a solution
to $\mathcal{L}$ iff there is a solution to $\mathcal{C}$. Let $L$ and $p(H^L, V^L)$ denote the
DAG and the probability distribution constructed in the reduction of $\mathcal{F}$ into
$\mathcal{L}$. Recall that $I(p(H^L, V^L)) = I(L)$. Moreover:



• Let $L^1$ denote the DAG over $(H^L, V^L)$ that has all and only the arcs in $L$ whose both
endpoints are in $V^L$.

• Let $L^2$ denote the DAG over $(H^L, V^L)$ that only has the arcs
$B_{ij} \rightarrow C_{ij} \leftarrow H_{ij} \rightarrow F_{ij}$ for all $i$ and $j$.

• Let $L^3$ denote the DAG over $(H^L, V^L)$ that only has the arcs
$C_{ij} \leftarrow H_{ij} \rightarrow F_{ij} \leftarrow E_{ij}$ for all $i$ and $j$.

Note that any separation statement that holds in $L$ also holds in $L^1$, $L^2$ and $L^3$. Then,
$I(p(H^L, V^L)) = I(L) \subseteq \cap_{i=1}^3 I(L^i)$ and, thus,
$I(p(V^L)) \subseteq [\cap_{i=1}^3 I(L^i)]_{V^L} = \cap_{i=1}^3 [I(L^i)]_{V^L}$.
Let $C^1$, $C^2$ and $C^3$ denote the DAGs constructed in the reduction of $\mathcal{L}$ into
$\mathcal{C}$. Note that $[I(L^i)]_{V^L} = I(C^i)$ for all $i$. Then,
$I(p(V^L)) \subseteq \cap_{i=1}^3 I(C^i)$ and, thus, if there is a solution to $\mathcal{L}$
then there is a solution to $\mathcal{C}$. We now prove the opposite. The proof is essentially the
same as that of (Chickering et al., 2004, Theorem 9). Let us define the $(V_i, V_j)$ edge
component of a DAG $G$ over $V^L$ as the subgraph of $G$ that has all and only the arcs in $G$
whose both endpoints are in
$\{V_i, A_{ij}, B_{ij}, C_{ij}, D_{ij}, E_{ij}, F_{ij}, G_{ij}, V_j\}$. Given a solution $C$ to
$\mathcal{C}$, we create another solution $C'$ to $\mathcal{C}$ as follows:

• Initialize $C'$ to $C^1$.

• For every $(V_i, V_j)$ edge component of $C$, if there is no directed path in $C$ from $V_i$
to $V_j$, then add to $C'$ the arcs $E_{ij} \rightarrow C_{ij} \leftarrow F_{ij}$.

• For every $(V_i, V_j)$ edge component of $C$, if there is a directed path in $C$ from $V_i$ to
$V_j$, then add to $C'$ the arcs $B_{ij} \rightarrow F_{ij} \leftarrow C_{ij}$.

Note that $C'$ is acyclic because $C$ is acyclic. Moreover,
$I(C') \subseteq \cap_{i=1}^3 I(C^i)$ because $I(C') \subseteq I(C^i)$ for all $i$. In order to be
able to conclude that $C'$ is a solution to $\mathcal{C}$, it only remains to prove that the
number of parameters associated with $C'$ is not greater than $d$. Specifically, we prove below
that $C'$ does not have more parameters associated than $C$, which has no more than $d$
parameters associated because it is a solution to $\mathcal{C}$.

As seen before, $I(C') \subseteq I(C^1)$. Likewise, $I(C) \subseteq I(C^1)$ because $C$ is a
solution to $\mathcal{C}$. Thus, there exists a sequence $S$ (resp. $S'$) of covered arc reversals
and arc additions that transforms $C^1$ into $C$ (resp. $C'$) (Chickering, 2002, Theorem 4).
Note that a covered arc reversal does not modify the number of parameters associated with a
DAG, whereas an arc addition increases it (Chickering, 1995, Theorem 3). Thus, $S$ and $S'$
monotonically increase the number of parameters associated with $C^1$ as they transform it.
Recall that $C^1$ consists of a series of edge components of the form
[Diagram of an edge component of $C^1$. Nodes (number of states): $A_{ij}$ (9), $B_{ij}$ (2),
$C_{ij}$ (3), $D_{ij}$ (9), $E_{ij}$ (2), $F_{ij}$ (2), $G_{ij}$ (9), $V_i$ (9), $V_j$ (9).]



The number in parenthesis besides each node is the number of states of the corresponding
random variable. Let us study how the sequences $S$ and $S'$ modify each edge component of
$C^1$. $S'$ simply adds the arcs $B_{ij} \rightarrow F_{ij} \leftarrow C_{ij}$ or the arcs
$E_{ij} \rightarrow C_{ij} \leftarrow F_{ij}$. Note that adding the first pair of arcs results in
an increase of 10 parameters, whereas adding the second pair of arcs results in an increase of
12 parameters. Unlike $S'$, $S$ may reverse some arc in the edge component. If that is the
case, then $S$ must cover the arc first, which implies an increase of at least 16 parameters
(covering $F_{ij} \rightarrow V_j$ by adding $E_{ij} \rightarrow V_j$ implies an increase of
exactly 16 parameters, whereas any other arc covering implies a larger increase). Then, $S$
implies a larger increase in the number of parameters than $S'$. On the other hand, if $S$ does
not reverse any arc in the edge component, then $S$ simply adds the arcs that are in $C$ but not
in $C^1$. Note that either $C_{ij} \rightarrow F_{ij}$ or $C_{ij} \leftarrow F_{ij}$ is in $C$,
because otherwise $C_{ij} \perp_C F_{ij} | Z$ for some $Z \subseteq V^L$, which contradicts the
fact that $C$ is a solution to $\mathcal{C}$ since $C_{ij} \not\perp_{C^2} F_{ij} | Z$. If
$C_{ij} \rightarrow F_{ij}$ is in $C$, then either $B_{ij} \rightarrow F_{ij}$ or
$B_{ij} \leftarrow F_{ij}$ is in $C$ because otherwise $B_{ij} \perp_C F_{ij} | Z$ for some
$Z \subseteq V^L$ such that $C_{ij} \in Z$, which contradicts the fact that $C$ is a solution to
$\mathcal{C}$ since $B_{ij} \not\perp_{C^2} F_{ij} | Z$. As $B_{ij} \leftarrow F_{ij}$ would
create a cycle in $C$, $B_{ij} \rightarrow F_{ij}$ is in $C$. Therefore, $S$ adds the arcs
$B_{ij} \rightarrow F_{ij} \leftarrow C_{ij}$ and, by construction of $C'$, $S'$ also adds them.
Thus, $S$ implies an increase of at least as many parameters as $S'$. On the other hand, if
$C_{ij} \leftarrow F_{ij}$ is in $C$, then either $C_{ij} \rightarrow E_{ij}$ or
$C_{ij} \leftarrow E_{ij}$ is in $C$ because otherwise $C_{ij} \perp_C E_{ij} | Z$ for some
$Z \subseteq V^L$ such that $F_{ij} \in Z$, which contradicts the fact that $C$ is a solution to
$\mathcal{C}$ since $C_{ij} \not\perp_{C^3} E_{ij} | Z$. As $C_{ij} \rightarrow E_{ij}$ would
create a cycle in $C$, $C_{ij} \leftarrow E_{ij}$ is in $C$. Therefore, $S$ adds the arcs
$E_{ij} \rightarrow C_{ij} \leftarrow F_{ij}$ and, by construction of $C'$, $S'$ adds either the
arcs $E_{ij} \rightarrow C_{ij} \leftarrow F_{ij}$ or the arcs
$B_{ij} \rightarrow F_{ij} \leftarrow C_{ij}$. In any case, $S$ implies an increase of at least as
many parameters as $S'$. Consequently, $C'$ does not have more parameters associated than $C$.
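The increases of 10, 12 and 16 parameters quoted above can be checked against the parameter formula of Section 2. The computation below is a sketch that assumes, as the surrounding argument suggests, that in $C^1$ the only parent of $F_{ij}$ is $E_{ij}$, the only parent of $C_{ij}$ is $B_{ij}$, and that only the parent $F_{ij}$ of $V_j$ within the edge component is counted. Each node $B$ contributes $[\prod_{A \in Pa(B)} r_A](r_B - 1)$ parameters.

```latex
\begin{align*}
\text{Adding } B_{ij} \rightarrow F_{ij} \leftarrow C_{ij}:\quad
  & r_{E_{ij}} r_{B_{ij}} r_{C_{ij}} (r_{F_{ij}} - 1) - r_{E_{ij}} (r_{F_{ij}} - 1)
    = 2 \cdot 2 \cdot 3 \cdot 1 - 2 \cdot 1 = 10, \\
\text{Adding } E_{ij} \rightarrow C_{ij} \leftarrow F_{ij}:\quad
  & r_{B_{ij}} r_{E_{ij}} r_{F_{ij}} (r_{C_{ij}} - 1) - r_{B_{ij}} (r_{C_{ij}} - 1)
    = 2 \cdot 2 \cdot 2 \cdot 2 - 2 \cdot 2 = 12, \\
\text{Adding } E_{ij} \rightarrow V_j:\quad
  & r_{F_{ij}} r_{E_{ij}} (r_{V_j} - 1) - r_{F_{ij}} (r_{V_j} - 1)
    = 2 \cdot 2 \cdot 8 - 2 \cdot 8 = 16.
\end{align*}
```

Since $16 > 12 > 10$, any sequence that has to cover an arc before reversing it already costs more than either pair of arcs that $S'$ may add.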



Finally, note that $I(C') \subseteq I(p(V^L))$ by (Chickering et al., 2004, Lemma 7). Thus, if
there is a solution to $\mathcal{C}$ then there is a solution to $\mathcal{L}$.

□ 

It is worth noting that our proof above contains two restrictions. First, the number of
DAGs to consensuate is three. Second, the number of states of each random variable in $V^L$ is
not arbitrary but prescribed. The first restriction is easy to relax: Our proof can be extended
to consensuate more than three DAGs by simply letting $C^i$ be a DAG over $V^L$ with no arcs
for all $i > 3$. However, it is an open question whether CONSENSUS remains NP-hard when
the number of DAGs to consensuate is two and/or the number of states of each random
variable in $V^L$ is arbitrary.

The following theorem strengthens the previous one.

Theorem 3. CONSENSUS is NP-complete. 

Proof. By Theorem 2, all that remains to prove is that CONSENSUS is in NP, i.e. that we can
verify in polynomial time if a given DAG $G$ is a solution to a given instance of CONSENSUS.

Let $\alpha$ denote any node ordering that is consistent with $G$. The causal list of $G$ relative
to $\alpha$ is the set of separation statements
$A \perp_G Pre_\alpha(A) \setminus Pa_G(A) | Pa_G(A)$ for all nodes $A$. It is known that $I(G)$
coincides with the closure with respect to the graphoid properties of the causal list of $G$
relative to $\alpha$ (Pearl, 1988, Corollary 7). Therefore,
$I(G) \subseteq \cap_{i=1}^m I(G^i)$ iff
$A \perp_{G^i} Pre_\alpha(A) \setminus Pa_G(A) | Pa_G(A)$ for all nodes $A$ and all
$1 \leq i \leq m$, because $\cap_{i=1}^m I(G^i)$ is a graphoid (del Sagrado and Moral, 2003,
Corollary 1). Let $n$, $a$ and $a_i$ denote, respectively, the number of nodes in $G$, the number
of arcs in $G$, and the number of arcs in $G^i$. Let $b = \max_{1 \leq i \leq m} a_i$.
Checking a separation statement in $G^i$ takes $O(a_i)$ time (Geiger et al., 1990, p. 530). Then,
checking whether $I(G) \subseteq \cap_{i=1}^m I(G^i)$ takes $O(mnb)$ time, since there are $n$
statements to check in each of the $m$ DAGs. Finally, note that computing the number of
parameters associated with $G$ takes $O(a)$ time.

□ 

4. Finding an Approximated Consensus DAG 

Since finding a consensus DAG of some given DAGs is NP-hard, we decide to resort to
heuristics to find an approximated consensus DAG. This does not mean that we discard the
existence of fast super-polynomial algorithms. It simply means that we do not pursue that
possibility in this paper. Specifically, in this paper we consider the following heuristic due to
Matzkevich and Abramson (1992, 1993a,b). First, let $\alpha$ denote any ordering of the nodes
in the given DAGs, which we denote here as $G^1, \ldots, G^m$. Then, find the MDI map
$G^i_\alpha$ of each $G^i$ relative to $\alpha$. Finally, let the approximated consensus DAG be
the DAG whose arcs are exactly the union of the arcs in $G^1_\alpha, \ldots, G^m_\alpha$. The
following theorem justifies taking the union of the arcs. Specifically, it proves that the DAG
returned by the heuristic is the consensus DAG if the latter were required to be consistent
with $\alpha$.
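The last step of the heuristic, the union of arcs, is simple enough to sketch in a few lines of Python. The sketch assumes that the MDI maps $G^1_\alpha, \ldots, G^m_\alpha$ have already been computed and that each is given as a dictionary from nodes to parent sets; both the representation and the function names are illustrative assumptions, not part of the paper.

```python
def union_of_arcs(mdi_maps):
    """DAG whose arcs are exactly the union of the arcs of the given DAGs."""
    nodes = set().union(*(g.keys() for g in mdi_maps))
    return {
        v: set().union(*(g.get(v, set()) for g in mdi_maps)) for v in nodes
    }


def is_consistent(parents, alpha):
    """True if every arc A -> B has A preceding B in the ordering alpha."""
    pos = {v: i for i, v in enumerate(alpha)}
    return all(pos[p] < pos[v] for v, ps in parents.items() for p in ps)


# Two illustrative DAGs, both consistent with the ordering (I, J, K).
g1 = {"I": set(), "J": {"I"}, "K": {"I"}}
g2 = {"I": set(), "J": set(), "K": {"J"}}
h = union_of_arcs([g1, g2])
print(h, is_consistent(h, ("I", "J", "K")))
```

Because every input is consistent with the same ordering, so is the union, which is why the result is guaranteed to be acyclic.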

Theorem 4. The DAG $H$ returned by the heuristic above is the DAG that has the fewest
parameters associated among all the MDI maps of $\cap_{i=1}^m I(G^i)$ relative to $\alpha$.

Proof. We start by proving that $H$ is a MDI map of $\cap_{i=1}^m I(G^i)$. First, we show that
$I(H) \subseteq \cap_{i=1}^m I(G^i)$. It suffices to note that
$I(H) \subseteq \cap_{i=1}^m I(G^i_\alpha)$ because each $G^i_\alpha$ is a subgraph of $H$, and
that $\cap_{i=1}^m I(G^i_\alpha) \subseteq \cap_{i=1}^m I(G^i)$ because
$I(G^i_\alpha) \subseteq I(G^i)$ for all $i$. Now, assume to the contrary that the DAG $H'$
resulting from removing an arc $A \rightarrow B$ from $H$ satisfies that
$I(H') \subseteq \cap_{i=1}^m I(G^i)$. By construction of $H$, $A \rightarrow B$ is in
$G^i_\alpha$ for some $i$, say $i = j$. Note that
$B \perp_{H'} Pre_\alpha(B) \setminus Pa_{H'}(B) | Pa_{H'}(B)$, which implies
$B \perp_{G^j} Pre_\alpha(B) \setminus ((\cup_{i=1}^m Pa_{G^i_\alpha}(B)) \setminus \{A\}) | (\cup_{i=1}^m Pa_{G^i_\alpha}(B)) \setminus \{A\}$
because $Pa_{H'}(B) = (\cup_{i=1}^m Pa_{G^i_\alpha}(B)) \setminus \{A\}$ and
$I(H') \subseteq \cap_{i=1}^m I(G^i)$. Note also that
$B \perp_{G^j_\alpha} Pre_\alpha(B) \setminus Pa_{G^j_\alpha}(B) | Pa_{G^j_\alpha}(B)$, which
implies $B \perp_{G^j} Pre_\alpha(B) \setminus Pa_{G^j_\alpha}(B) | Pa_{G^j_\alpha}(B)$ because
$I(G^j_\alpha) \subseteq I(G^j)$. Therefore,
$B \perp_{G^j} Pre_\alpha(B) \setminus (Pa_{G^j_\alpha}(B) \setminus \{A\}) | Pa_{G^j_\alpha}(B) \setminus \{A\}$
by intersection. However, this contradicts the fact that $G^j_\alpha$ is the MDI map of $G^j$
relative to $\alpha$. Then, $H$ is a MDI map of $\cap_{i=1}^m I(G^i)$ relative to $\alpha$.

Finally, note that $\cap_{i=1}^m I(G^i)$ is a graphoid (del Sagrado and Moral, 2003, Corollary 1).
Consequently, $H$ is the only MDI map of $\cap_{i=1}^m I(G^i)$ relative to $\alpha$.

□ 

A key step in the heuristic above is, of course, choosing a good node ordering $\alpha$.
Unfortunately, the fact that CONSENSUS is NP-hard implies that it is also NP-hard to find the
best node ordering $\alpha$, i.e. the node ordering that makes the heuristic return the MDI map
of $\cap_{i=1}^m I(G^i)$ that has the fewest parameters associated. To see it, note that if there
existed an efficient algorithm for finding the best node ordering, then Theorem 4 would imply
that we could solve CONSENSUS efficiently by running the heuristic with the best node ordering.

In the last sentence, we have implicitly assumed that the heuristic is efficient, which implies
that we have implicitly assumed that we can efficiently find the MDI map $G^i_\alpha$ of each
$G^i$. The rest of this paper shows that this assumption is correct.

5. Methods A and B are not Correct 



Matzkevich and Abramson (1993b) not only propose the heuristic discussed in the
previous section, but also present two algorithms, called Methods A and B, for efficiently
deriving the MDI map G_α of a DAG G relative to a node ordering α. The algorithms work
iteratively by covering and reversing an arc in G until the resulting DAG is consistent with
α. It is obvious that such a way of working produces a directed independence map of G.
However, in order to arrive at G_α, the arc to cover and reverse in each iteration must be

Construct β(G, α)

/* Given a DAG G and a node ordering α, the algorithm returns a node ordering β that
is consistent with G and as close to α as possible */

1   β = ∅
2   G' = G
3   Let A denote a sink node in G'
    /* 3 Let A denote the rightmost node in α that is a sink node in G' */
4   Add A as the leftmost node in β
5   Let B denote the right neighbor of A in β
6   If B ≠ ∅ and A ∉ Pa_G(B) and A is to the right of B in α then
7       Interchange A and B in β
8       Go to line 5
9   Remove A and all its incoming arcs from G'
10  If G' ≠ ∅ then go to line 3
11  Return β

Method A(G, α)

/* Given a DAG G and a node ordering α, the algorithm returns G_α */

1   β = Construct β(G, α)
2   Let Y denote the leftmost node in β whose left neighbor in β is to its right in α
3   Let Z denote the left neighbor of Y in β
4   If Z is to the right of Y in α then
5       If Z → Y is in G then cover and reverse Z → Y in G
6       Interchange Y and Z in β
7       Go to line 3
8   If β ≠ α then go to line 2
9   Return G

Method B(G, α)

/* Given a DAG G and a node ordering α, the algorithm returns G_α */

1   β = Construct β(G, α)
2   Let Y denote the leftmost node in β whose right neighbor in β is to its left in α
3   Let Z denote the right neighbor of Y in β
4   If Z is to the left of Y in α then
5       If Y → Z is in G then cover and reverse Y → Z in G
6       Interchange Y and Z in β
7       Go to line 3
8   If β ≠ α then go to line 2
9   Return G

Figure 1. Construct β, and Methods A and B. Our correction of Construct
β consists in replacing line 3 with the line in comments under it.

carefully chosen. The pseudocode of Methods A and B can be seen in Figure 1. Method A
starts by calling Construct β to derive a node ordering β that is consistent with G and as

[Figure 2: three DAGs over the nodes I, J, K, L and M. Left: the input DAG G. Center: G_α
for α = (M, I, K, J, L). Right: the output of Methods A and B.]

Figure 2. A counterexample to the correctness of Methods A and B.

close to α as possible (line 6). By β being as close to α as possible, we mean that the number
of arcs Methods A and B will later cover and reverse is kept at a minimum, because Methods
A and B will use β to choose the arc to cover and reverse in each iteration. In particular,
Method A finds the leftmost node in β that should be interchanged with its left neighbor
(line 2) and it repeatedly interchanges this node with its left neighbor (lines 3-4 and 6-7).
Each of these interchanges is preceded by covering and reversing the corresponding arc in G
(line 5). Method B is essentially identical to Method A. The only differences between them
are that the word "right" is replaced by the word "left" and vice versa in lines 2-4, and that
the arcs point in opposite directions in line 5.



Methods A and B are claimed to be correct in (Matzkevich and Abramson, 1993b, Theorem
4 and Corollary 2), although no proof is provided (a proof is only sketched). The following
counterexample shows that Methods A and B are actually not correct. Let G be the DAG
in the left-hand side of Figure 2. Let α = (M, I, K, J, L). Then, we can make use of the
characterization introduced in Section 2 to see that G_α is the DAG in the center of Figure 2.
However, Methods A and B return the DAG in the right-hand side of Figure 2. To see it, we
follow the execution of Methods A and B step by step. First, Methods A and B construct β
by calling Construct β, which runs as follows:

(1) Initially, β = ∅ and G' = G.

(2) Select the sink node M in G'. Then, β = (M). Remove M and its incoming arcs from
G'.

(3) Select the sink node L in G'. Then, β = (L, M). No interchange in β is performed
because L ∈ Pa_G(M). Remove L and its incoming arcs from G'.

(4) Select the sink node K in G'. Then, β = (K, L, M). No interchange in β is performed
because K is to the left of L in α. Remove K and its incoming arcs from G'.

(5) Select the sink node J in G'. Then, β = (J, K, L, M). No interchange in β is performed
because J ∈ Pa_G(K).

(6) Select the sink node I in G'. Then, β = (I, J, K, L, M). No interchange in β is
performed because I is to the left of J in α.

When Construct β ends, Methods A and B continue as follows:

(7) Initially, β = (I, J, K, L, M).

(8) Add the arc I → J and reverse the arc J → K in G. Interchange J and K in β.
Then, β = (I, K, J, L, M).

(9) Add the arc J → M and reverse the arc L → M in G. Interchange L and M in β.
Then, β = (I, K, J, M, L).

(10) Add the arcs I → M and K → M, and reverse the arc J → M in G. Interchange J
and M in β. Then, β = (I, K, M, J, L).

(11) Reverse the arc K → M in G. Interchange K and M in β. Then, β = (I, M, K, J, L).

(12) Reverse the arc I → M in G. Interchange I and M in β. Then, β = (M, I, K, J, L) =
α.

As a matter of fact, one can see as early as in step (8) above that Methods A and B will
fail: I and M are not separated in the DAG resulting from step (8), which
implies that I and M will not be separated in the DAG returned by Methods A and B,
because covering and reversing arcs never introduces new separation statements. However, I
and M are separated in G_α.
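
Steps (7)-(12) can be replayed with a short Python sketch, offered only as an illustration under stated assumptions. The representation of a DAG as a dictionary of parent sets and the names cover_and_reverse and method_b are hypothetical. The arcs of G (I → K, J → K, J → L, L → M) are inferred from the trace above, since Figure 2 is not reproduced here.

```python
def cover_and_reverse(pa, y, z):
    """Cover the arc y -> z, then reverse it (pa maps node -> set of parents)."""
    shared = pa[y] | (pa[z] - {y})   # covering: both endpoints get the same other parents
    pa[z] = set(shared)              # the reversal removes y from Pa(z)
    pa[y] = shared | {z}             # the reversed arc is z -> y


def method_b(pa, alpha, beta):
    """Lines 2-9 of Method B in Figure 1, started from a given ordering beta."""
    pa = {v: set(ps) for v, ps in pa.items()}
    pos = {v: i for i, v in enumerate(alpha)}
    beta = list(beta)
    while beta != list(alpha):                                   # line 8
        # line 2: leftmost node whose right neighbor is to its left in alpha
        k = next(i for i in range(len(beta) - 1)
                 if pos[beta[i + 1]] < pos[beta[i]])
        # lines 3-7: keep interchanging that node with its right neighbor
        while k + 1 < len(beta) and pos[beta[k + 1]] < pos[beta[k]]:
            y, z = beta[k], beta[k + 1]
            if y in pa[z]:                                       # line 5
                cover_and_reverse(pa, y, z)
            beta[k], beta[k + 1] = z, y                          # line 6
            k += 1
    return pa


# Assumed arcs of G, inferred from the trace: I -> K, J -> K, J -> L, L -> M.
G = {"I": set(), "J": set(), "K": {"I", "J"}, "L": {"J"}, "M": {"L"}}
alpha = ["M", "I", "K", "J", "L"]
out = method_b(G, alpha, ["I", "J", "K", "L", "M"])   # beta from steps (1)-(6)
arcs = sorted((p, c) for c, ps in out.items() for p in ps)
print(arcs)
```

Under these assumptions the returned DAG contains the arc M → I, so I and M are adjacent in it, which is consistent with the failure described above.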

Note that we constructed β by selecting first M, then L, then K, then J, and finally I.
However, we could have selected first K, then I, then M, then L, and finally J, which would
have resulted in β = (J, L, M, I, K). With this β, Methods A and B return G_α. Therefore, it
makes a difference which sink node is selected in line 3 of Construct β. However, Construct
β overlooks this detail. We propose correcting Construct β by replacing line 3 with "Let A
denote the rightmost node in α that is a sink node in G'". Hereinafter, we assume that any
call to Construct β is a call to the corrected version thereof. The rest of this paper is devoted
to proving that Methods A and B now do return G_α.
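
The corrected Construct β can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the parent-set dictionary, the function name construct_beta, and the arcs assumed for G (I → K, J → K, J → L, L → M, inferred from the trace in the previous section) are not taken from the original figures.

```python
def construct_beta(pa, alpha):
    """Corrected Construct beta: pa maps each node to its set of parents in G."""
    pos = {v: i for i, v in enumerate(alpha)}
    remaining = set(pa)          # nodes still in G'
    beta = []
    while remaining:                                             # line 10
        # sinks of G': nodes that are not a parent of any remaining node
        sinks = [v for v in remaining
                 if not any(v in pa[w] for w in remaining)]
        a = max(sinks, key=pos.__getitem__)   # corrected line 3: rightmost sink in alpha
        beta.insert(0, a)                                        # line 4
        k = 0
        while k + 1 < len(beta):                                 # lines 5-8
            b = beta[k + 1]
            if a not in pa[b] and pos[a] > pos[b]:               # line 6
                beta[k], beta[k + 1] = b, a                      # line 7
                k += 1
            else:
                break
        remaining.remove(a)                                      # line 9
    return beta


# Assumed arcs of G: I -> K, J -> K, J -> L, L -> M.
G = {"I": set(), "J": set(), "K": {"I", "J"}, "L": {"J"}, "M": {"L"}}
beta = construct_beta(G, ["M", "I", "K", "J", "L"])
print(beta)
```

On this assumed input the sketch selects K, I, M, L and J in that order, which is the selection order discussed above.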

6. The Corrected Methods A and B are Correct 

Before proving that Methods A and B are correct, we introduce some auxiliary lemmas.
Their proofs can be found in the appendix. We call iterating through lines 3-7 in Method A
while possible percolating Y right-to-left in β. Let us modify Method A by replacing
line 2 by "Let Y denote the leftmost node in β that has not been considered before" and by
adding the check Z ≠ ∅ to line 4. The pseudocode of the resulting algorithm, which we call
Method A2, can be seen in Figure 3. Method A2 percolates right-to-left in β, one by one, all
the nodes in the order in which they appear in β.

Lemma 1. Method A(G, α) and Method A2(G, α) return the same DAG.

Lemma 2. Method A2(G, α) and Method B(G, α) return the same DAG.

We call iterating through lines 3-7 in Method B while possible percolating Y left-to-right
in β. Let us modify Method B by replacing line 2 by "Let Y denote the rightmost node
in α that has not been considered before" and by adding the check Z ≠ ∅ to line 4. The
pseudocode of the resulting algorithm, which we call Method B2, can be seen in Figure 3.
Method B2 percolates left-to-right in β, one by one, all the nodes in the reverse order in which
they appear in α.

Lemma 3. Method B(G, α) and Method B2(G, α) return the same DAG.
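
The percolation performed by Method B2 can be sketched in Python as below. The sketch makes several assumptions: a DAG is a dictionary of parent sets, the names cover_and_reverse and method_b2 are hypothetical, β is passed in rather than computed, and the example arcs (I → K, J → K, J → L, L → M) are inferred from the trace in the previous section. The loop visits every node once instead of testing β ≠ α, which does not change the result because percolating a node that is already in place does nothing.

```python
def cover_and_reverse(pa, y, z):
    """Cover the arc y -> z, then reverse it (pa maps node -> set of parents)."""
    shared = pa[y] | (pa[z] - {y})   # parents both endpoints have after covering
    pa[z] = set(shared)              # the reversal removes y from Pa(z)
    pa[y] = shared | {z}             # the reversed arc is z -> y


def method_b2(pa, alpha, beta):
    """Lines 2-9 of Method B2, started from a given ordering beta."""
    pa = {v: set(ps) for v, ps in pa.items()}
    pos = {v: i for i, v in enumerate(alpha)}
    beta = list(beta)
    for y in reversed(alpha):        # line 2: rightmost unconsidered node in alpha
        k = beta.index(y)
        # lines 3-7: percolate y left-to-right while a right neighbor z exists
        # and z is to the left of y in alpha
        while k + 1 < len(beta) and pos[beta[k + 1]] < pos[y]:
            z = beta[k + 1]
            if y in pa[z]:                       # line 5
                cover_and_reverse(pa, y, z)
            beta[k], beta[k + 1] = z, y          # line 6
            k += 1
    return pa, beta


# Assumed arcs of G: I -> K, J -> K, J -> L, L -> M.
G = {"I": set(), "J": set(), "K": {"I", "J"}, "L": {"J"}, "M": {"L"}}
alpha = ["M", "I", "K", "J", "L"]
out, final_beta = method_b2(G, alpha, ["J", "L", "M", "I", "K"])
print(out)
```

Started from β = (J, L, M, I, K), the sketch ends with β = α and a DAG in which I and M are not adjacent.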

We are now ready to prove the main result of this paper. 

Theorem 5. Let G_α denote the MDI map of a DAG G relative to a node ordering α. Then,
Method A(G, α) and Method B(G, α) return G_α.

Proof. By Lemmas 1-3, it suffices to prove that Method B2(G, α) returns G_α. It is evident
that Method B2 transforms β into α and, thus, that it halts at some point. Therefore,
Method B2 performs a finite sequence of n modifications (arc additions and covered arc
reversals) to G. Let G_i denote the DAG resulting from the first i modifications to G, and
let G_0 = G. Specifically, Method B2 constructs G_{i+1} from G_i by either (i) reversing the

Method A2(G, α)

/* Given a DAG G and a node ordering α, the algorithm returns G_α */

1   β = Construct β(G, α)
2   Let Y denote the leftmost node in β that has not been considered before
3   Let Z denote the left neighbor of Y in β
4   If Z ≠ ∅ and Z is to the right of Y in α then
5       If Z → Y is in G then cover and reverse Z → Y in G
6       Interchange Y and Z in β
7       Go to line 3
8   If β ≠ α then go to line 2
9   Return G

Method B2(G, α)

/* Given a DAG G and a node ordering α, the algorithm returns G_α */

1   β = Construct β(G, α)
2   Let Y denote the rightmost node in α that has not been considered before
3   Let Z denote the right neighbor of Y in β
4   If Z ≠ ∅ and Z is to the left of Y in α then
5       If Y → Z is in G then cover and reverse Y → Z in G
6       Interchange Y and Z in β
7       Go to line 3
8   If β ≠ α then go to line 2
9   Return G

Figure 3. Methods A2 and B2.



covered arc Y → Z, or (ii) adding the arc X → Z for some X ∈ Pa_{G_i}(Y) \ Pa_{G_i}(Z), or (iii)
adding the arc X → Y for some X ∈ Pa_{G_i}(Z) \ Pa_{G_i}(Y). Note that I(G_{i+1}) ⊆ I(G_i) for all
0 ≤ i < n and, thus, that I(G_n) ⊆ I(G_0).

We start by proving that G_i is a DAG that is consistent with β for all 0 ≤ i ≤ n. Since
this is true for G_0 due to line 1, it suffices to prove that if G_i is a DAG that is consistent with
β then so is G_{i+1} for all 0 ≤ i < n. We consider the following four cases.

Case 1: Method B2 constructs G_{i+1} from G_i by reversing the covered arc Y → Z. Then,
G_{i+1} is a DAG because reversing a covered arc does not create any cycle (Chickering,
1995, Lemma 1). Moreover, note that Y and Z are interchanged in β immediately
after the covered arc reversal. Thus, G_{i+1} is consistent with β.
Case 2: Method B2 constructs G_{i+1} from G_i by adding the arc X → Z for some
X ∈ Pa_{G_i}(Y) \ Pa_{G_i}(Z). Note that X is to the left of Y and Y to the left of Z in β,
because G_i is consistent with β. Then, X is to the left of Z in β and, thus, G_{i+1} is a
DAG that is consistent with β.
Case 3: Method B2 constructs G_{i+1} from G_i by adding the arc X → Y for some
X ∈ Pa_{G_i}(Z) \ Pa_{G_i}(Y). Note that X is to the left of Z in β because G_i is consistent
with β, and Y is the left neighbor of Z in β (recall line 3). Then, X is to the left of
Y in β and, thus, G_{i+1} is a DAG that is consistent with β.
Case 4: Note that β may get modified before Method B2 constructs G_{i+1} from G_i.
Specifically, this happens when Method B2 executes lines 5-6 but there is no arc

Figure 4. Different cases in the proof of Theorem 5. Only the relevant subgraphs
of G_{i+1} and G_α are depicted. An undirected edge between two nodes
denotes that the nodes are adjacent. A curved edge between two nodes denotes
an S-active route between the two nodes. If the curved edge is directed, then
the route is descending. A grey node denotes a node that is in S.

between Y and Z in G_i. However, the fact that G_i is consistent with β before Y and
Z are interchanged in β and the fact that Y and Z are neighbors in β (recall line 3)
imply that G_i is consistent with β after Y and Z have been interchanged.

Since Method B2 transforms β into α, it follows from the result proven above that G_n is
a DAG that is consistent with α. In order to prove the theorem, i.e. that G_n = G_α, all
that remains to prove is that I(G_α) ⊆ I(G_n). To see it, note that G_n = G_α follows from
I(G_α) ⊆ I(G_n), I(G_n) ⊆ I(G_0), the fact that G_n is a DAG that is consistent with α, and the
fact that G_α is the unique MDI map of G_0 relative to α. Recall that G_α is guaranteed to be
unique because I(G_0) is a graphoid.

The rest of the proof is devoted to proving that I(G_α) ⊆ I(G_n). Specifically, we prove
that if I(G_α) ⊆ I(G_i) then I(G_α) ⊆ I(G_{i+1}) for all 0 ≤ i < n. Note that this implies
that I(G_α) ⊆ I(G_n) because I(G_α) ⊆ I(G_0) by definition of MDI map. First, we prove it
when Method B2 constructs G_{i+1} from G_i by reversing the covered arc Y → Z. That the
arc reversed is covered implies that I(G_{i+1}) = I(G_i) (Chickering, 1995, Lemma 1). Thus,
I(G_α) ⊆ I(G_{i+1}) because I(G_α) ⊆ I(G_i).

Now, we prove that if I(G_α) ⊆ I(G_i) then I(G_α) ⊆ I(G_{i+1}) for all 0 ≤ i < n when Method
B2 constructs G_{i+1} from G_i by adding an arc. Specifically, we prove that if there is an S-active
route ρ_{i+1}^{AB} between two nodes A and B in G_{i+1}, then there is an S-active route between A
and B in G_α. We prove this result by induction on the number of occurrences of the added
arc in ρ_{i+1}^{AB}. We assume without loss of generality that the added arc occurs in ρ_{i+1}^{AB} as few
or fewer times than in any other S-active route between A and B in G_{i+1}. We call this the
minimality property of ρ_{i+1}^{AB} (footnote 2). If the number of occurrences of the added arc in ρ_{i+1}^{AB} is zero,
then ρ_{i+1}^{AB} is an S-active route between A and B in G_i too and, thus, there is an S-active
route between A and B in G_α since I(G_α) ⊆ I(G_i). Assume as induction hypothesis that
the result holds for up to k occurrences of the added arc in ρ_{i+1}^{AB}. We now prove it for k + 1
occurrences. We consider the following two cases. Each case is illustrated in Figure 4.

Case 1: Method B2 constructs G_{i+1} from G_i by adding the arc X → Z for some X ∈
Pa_{G_i}(Y) \ Pa_{G_i}(Z). Note that X → Z occurs in ρ_{i+1}^{AB}. Let ρ_{i+1}^{AB} = ρ_{i+1}^{AX} ∪ X → Z ∪ ρ_{i+1}^{ZB} (footnote 3).
Note that X ∉ S and ρ_{i+1}^{AX} is S-active in G_{i+1} because, otherwise, ρ_{i+1}^{AB} would not be
S-active in G_{i+1}. Then, there is an S-active route ρ_α^{AX} between A and X in G_α by the
induction hypothesis. Moreover, Y ∈ S because, otherwise, ρ_{i+1}^{AX} ∪ X → Y → Z ∪ ρ_{i+1}^{ZB}
would be an S-active route between A and B in G_{i+1} that would violate the minimality
property of ρ_{i+1}^{AB}. Note that Y ← Z is in G_α because (i) Y and Z are adjacent in G_α
since I(G_α) ⊆ I(G_i), and (ii) Z is to the left of Y in α (recall line 4). Note also that
X → Y is in G_α. To see it, note that X and Y are adjacent in G_α since I(G_α) ⊆ I(G_i).
Recall that Method B2 percolates left-to-right in β one by one all the nodes in the
reverse order in which they appear in α. Method B2 is currently percolating Y and,
thus, the nodes to the right of Y in α are to the right of Y in β too. If X ← Y were in
G_α then X would be to the right of Y in α and, thus, X would be to the right of Y
in β. However, this would contradict the fact that X is to the left of Y in β, which
follows from the fact that G_i is consistent with β. Thus, X → Y is in G_α. We now
consider two cases.

Case 1.1: Assume that Z ∉ S. Then, ρ_{i+1}^{ZB} is S-active in G_{i+1} because, otherwise,
ρ_{i+1}^{AB} would not be S-active in G_{i+1}. Then, there is an S-active route ρ_α^{ZB} between
Z and B in G_α by the induction hypothesis. Then, ρ_α^{AX} ∪ X → Y ← Z ∪ ρ_α^{ZB} is
an S-active route between A and B in G_α.
Case 1.2: Assume that Z ∈ S. Then, ρ_{i+1}^{ZB} = Z ← W ∪ ρ_{i+1}^{WB} (footnote 4). Note that W ∉ S and
ρ_{i+1}^{WB} is S-active in G_{i+1} because, otherwise, ρ_{i+1}^{AB} would not be S-active in G_{i+1}.
Then, there is an S-active route ρ_α^{WB} between W and B in G_α by the induction
hypothesis. Note that W and Z are adjacent in G_α since I(G_α) ⊆ I(G_i). This
and the fact proven above that Y ← Z is in G_α imply that Y and W are adjacent
in G_α because, otherwise, Y ⊥̸_{G_i} W | U but Y ⊥_{G_α} W | U for some U ⊆ V such
that Z ∈ U, which would contradict that I(G_α) ⊆ I(G_i). In fact, Y ← W is in
G_α. To see it, recall that the nodes to the right of Y in α are to the right of Y in β
too. If Y → W were in G_α then W would be to the right of Y in α and, thus, W
would be to the right of Y in β too. However, this would contradict the fact that
W is to the left of Y in β, which follows from the fact that W is to the left of Z
in β because G_i is consistent with β, and the fact that Y is the left neighbor of
Z in β (recall line 3). Thus, Y ← W is in G_α. Then, ρ_α^{AX} ∪ X → Y ← W ∪ ρ_α^{WB}
is an S-active route between A and B in G_α.
Case 2: Method B2 constructs G_{i+1} from G_i by adding the arc X → Y for some X ∈
Pa_{G_i}(Z) \ Pa_{G_i}(Y). Note that X → Y occurs in ρ_{i+1}^{AB}. Let ρ_{i+1}^{AB} = ρ_{i+1}^{AX} ∪ X → Y ∪ ρ_{i+1}^{YB} (footnote 5).
Note that X ∉ S and ρ_{i+1}^{AX} is S-active in G_{i+1} because, otherwise, ρ_{i+1}^{AB} would not be



2 It is not difficult to show that the number of occurrences of the added arc in ρ_{i+1}^{AB} is then at most two
(see Case 2.1 for some intuition). However, the proof of the theorem is simpler if we ignore this fact.
3 Note that maybe A = X and/or B = Z.
4 Note that maybe W = B. Note also that W ≠ X because, otherwise, ρ_{i+1}^{AX} ∪ X → Y ← X ∪ ρ_{i+1}^{XB} would
be an S-active route between A and B in G_{i+1} that would violate the minimality property of ρ_{i+1}^{AB}.
5 Note that maybe A = X and/or B = Y.

S-active in G_{i+1}. Then, there is an S-active route ρ_α^{AX} between A and X in G_α by the
induction hypothesis. Note that Y ← Z is in G_α because (i) Y and Z are adjacent in
G_α since I(G_α) ⊆ I(G_i), and (ii) Z is to the left of Y in α (recall line 4). Note also
that X and Z are adjacent in G_α since I(G_α) ⊆ I(G_i). This and the fact that Y ← Z
is in G_α imply that X and Y are adjacent in G_α because, otherwise, X ⊥̸_{G_i} Y | U
but X ⊥_{G_α} Y | U for some U ⊆ V such that Z ∈ U, which would contradict that
I(G_α) ⊆ I(G_i). In fact, X → Y is in G_α. To see it, recall that Method B2 percolates
left-to-right in β one by one all the nodes in the reverse order in which they appear in
α. Method B2 is currently percolating Y and, thus, the nodes to the right of Y in α
are to the right of Y in β too. If X ← Y were in G_α then X would be to the right of Y in
α and, thus, X would be to the right of Y in β too. However, this would contradict
the fact that X is to the left of Y in β, which follows from the fact that X is to the
left of Z in β because G_i is consistent with β, and the fact that Y is the left neighbor
of Z in β (recall line 3). Thus, X → Y is in G_α. We now consider three cases.

Case 2.1: Assume that Y ∈ S and ρ_{i+1}^{YB} = Y ← X ∪ ρ_{i+1}^{XB}. Note that ρ_{i+1}^{XB} is S-active
in G_{i+1} because, otherwise, ρ_{i+1}^{AB} would not be S-active in G_{i+1}. Then, there is an
S-active route ρ_α^{XB} between X and B in G_α by the induction hypothesis. Then,
ρ_α^{AX} ∪ X → Y ← X ∪ ρ_α^{XB} is an S-active route between A and B in G_α.
Case 2.2: Assume that Y ∈ S and ρ_{i+1}^{YB} = Y ← W ∪ ρ_{i+1}^{WB} (footnote 6). Note that W ∉ S and
ρ_{i+1}^{WB} is S-active in G_{i+1} because, otherwise, ρ_{i+1}^{AB} would not be S-active in G_{i+1}.
Then, there is an S-active route ρ_α^{WB} between W and B in G_α by the induction
hypothesis. Note also that Y ← W is in G_α. To see it, note that Y and W are
adjacent in G_α since I(G_α) ⊆ I(G_i). Recall that the nodes to the right of Y in α
are to the right of Y in β too. If Y → W were in G_α then W would be to the right
of Y in α and, thus, W would be to the right of Y in β too. However, this would
contradict the fact that W is to the left of Y in β, which follows from the fact that
G_i is consistent with β. Thus, Y ← W is in G_α. Then, ρ_α^{AX} ∪ X → Y ← W ∪ ρ_α^{WB}
is an S-active route between A and B in G_α.
Case 2.3: Assume that Y ∉ S. The proof of this case is based on that of step
8 in (Chickering, 2002, Lemma 30). Let D denote the node that is maximal in
G_α from the set of descendants of Y in G_i. Note that D is guaranteed to be
unique by (Chickering, 2002, Lemma 29), because I(G_α) ⊆ I(G_i). Note also that
D ≠ Y, because Z is a descendant of Y in G_i and, as shown above, Y ← Z is in
G_α. We now show that D is a descendant of Z in G_i. We consider three cases.
Case 2.3.1: Assume that D = Z. Then, D is a descendant of Z in G_i.
Case 2.3.2: Assume that D ≠ Z and D was a descendant of Z in G_0. Recall
that Method B2 percolates left-to-right in β one by one all the nodes in the
reverse order in which they appear in α. Method B2 is currently percolating
Y and, thus, it has not yet percolated Z because Z is to the left of Y in
α (recall line 4). Therefore, none of the descendants of Z in G_0 (among
which is D) is to the left of Z in β. This and the fact that β is consistent
with G_i imply that Z is a node that is maximal in G_i from the set of
descendants of Z in G_0. Actually, Z is the only such node by (Chickering,
2002, Lemma 29), because I(G_i) ⊆ I(G_0). Then, the descendants of Z in
G_0 are descendants of Z in G_i too. Thus, D is a descendant of Z in G_i.
Case 2.3.3: Assume that D ≠ Z and D was not a descendant of Z in G_0.
As shown in Case 2.3.2, the descendants of Z in G_0 are descendants of Z in
G_i too. Therefore, none of the descendants of Z in G_0 was to the left of D
in α because, otherwise, some descendant of Z and thus of Y in G_i would



6 Note that maybe W = B. Note also that W ≠ X, because the case where W = X is covered by Case 2.1.

Method G2H(G, H)

/* Given two DAGs G and H such that I(H) ⊆ I(G), the algorithm transforms
G into H by a sequence of arc additions and covered arc reversals such that
after each operation in the sequence G is a DAG and I(H) ⊆ I(G) */

1   Let α denote a node ordering that is consistent with H
2   G = Method B2(G, α)
3   Add to G the arcs that are in H but not in G

Figure 5. Method G2H.

be to the left of D in α, which would contradict the definition of D. This
and the fact that D was not a descendant of Z in G_0 imply that D was
still in G' when Z became a sink node of G' in Construct β (recall Figure
1). Therefore, Construct β added D to β after having added Z (recall lines
3-4), because D is to the left of Z in α by definition of D (footnote 7). For the same
reason, Construct β did not interchange D and Z in β afterwards (recall
line 6). For the same reason, Method B2 has not interchanged D and Z
in β (recall line 4). Thus, D is currently still to the left of Z in β, which
implies that D is to the left of Y in β, because Y is the left neighbor of Z
in β (recall line 3). However, this contradicts the fact that G_i is consistent
with β, because D is a descendant of Y in G_i. Thus, this case never occurs.
We continue with the proof of Case 2.3. Note that Y ∉ S implies that ρ_{i+1}^{YB} is
S-active in G_{i+1} because, otherwise, ρ_{i+1}^{AB} would not be S-active in G_{i+1}. Note
also that no descendant of Z in G_i is in S because, otherwise, there would be an
S-active route ρ_i^{XY} between X and Y in G_i and, thus, ρ_{i+1}^{AX} ∪ ρ_i^{XY} ∪ ρ_{i+1}^{YB} would
be an S-active route between A and B in G_{i+1} that would violate the minimality
property of ρ_{i+1}^{AB}. This implies that D ∉ S because, as shown above, D is a
descendant of Z in G_i. It also implies that there is an S-active descending route
ρ_i^{ZD} from Z to D in G_i. Then, ρ_{i+1}^{AX} ∪ X → Z ∪ ρ_i^{ZD} is an S-active route between
A and D in G_{i+1}. Likewise, ρ_{i+1}^{BY} ∪ Y → Z ∪ ρ_i^{ZD} is an S-active route between
B and D in G_{i+1}, where ρ_{i+1}^{BY} denotes the route resulting from reversing ρ_{i+1}^{YB}.
Therefore, there are S-active routes ρ_α^{AD} and ρ_α^{BD} between A and D and between
B and D in G_α by the induction hypothesis.

Consider the subroute of ρ_{i+1}^{AB} that starts with the arc X → Y and continues in
the direction of this arc until it reaches a node E such that E = B or E ∈ S.
Note that E is a descendant of Y in G_i and, thus, E is a descendant of D in G_α
by definition of D. Let ρ_α^{DE} denote the descending route from D to E in G_α.
Assume without loss of generality that G_α has no descending route from D to
B or to a node in S that is shorter than ρ_α^{DE}. This implies that if E = B then
ρ_α^{DE} is S-active in G_α because, as shown above, D ∉ S. Thus, ρ_α^{AD} ∪ ρ_α^{DE} is an
S-active route between A and B in G_α. On the other hand, if E ∈ S then E ≠ D
because D ∉ S. Thus, ρ_α^{AD} ∪ ρ_α^{DE} ∪ ρ_α^{ED} ∪ ρ_α^{DB} is an S-active route between A
and B in G_α, where ρ_α^{ED} and ρ_α^{DB} denote the routes resulting from reversing ρ_α^{DE}
and ρ_α^{BD}.

□ 



7 Note that this statement is true thanks to our correction of Construct β.

Finally, we show how the correctness of Method B2 leads to an alternative proof of the
so-called Meek's conjecture (Meek, 1997). Given two DAGs G and H such that I(H) ⊆ I(G),
Meek's conjecture states that we can transform G into H by a sequence of arc additions
and covered arc reversals such that after each operation in the sequence G is a DAG and
I(H) ⊆ I(G). The importance of Meek's conjecture lies in that it allows developing efficient
and asymptotically correct algorithms for learning BNs from data under mild assumptions
(Chickering, 2002; Chickering and Meek, 2002; Meek, 1997; Nielsen et al., 2003). Meek's
conjecture was proven to be true in (Chickering, 2002, Theorem 4) by developing an algorithm
that constructs a valid sequence of arc additions and covered arc reversals. We propose an
alternative algorithm to construct such a sequence. The pseudocode of our algorithm, called
Method G2H, can be seen in Figure 5. The following corollary proves that Method G2H is
correct.

Corollary 1. Given two DAGs G and H such that I(H) ⊆ I(G), Method G2H(G, H)
transforms G into H by a sequence of arc additions and covered arc reversals such that after
each operation in the sequence G is a DAG and I(H) ⊆ I(G).

Proof. Note from Method G2H's line 1 that α denotes a node ordering that is consistent with
H. Let G_α denote the MDI map of G relative to α. Recall that G_α is guaranteed to be unique
because I(G) is a graphoid. Note that I(H) ⊆ I(G) implies that G_α is a subgraph of H. To
see it, note that I(H) ⊆ I(G) implies that we can obtain an MDI map of G relative to α by
just removing arcs from H. However, G_α is the only MDI map of G relative to α.

Then, it follows from the proof of Theorem 5 that Method G2H's line 2 transforms G into
G_α by a sequence of arc additions and covered arc reversals, and that after each operation
in the sequence G is a DAG and I(G_α) ⊆ I(G). Thus, after each operation in the sequence
I(H) ⊆ I(G) because I(H) ⊆ I(G_α) since, as shown above, G_α is a subgraph of H. Moreover,
Method G2H's line 3 transforms G from G_α to H by a sequence of arc additions. Of course,
after each arc addition G is a DAG and I(H) ⊆ I(G) because G_α is a subgraph of H.

□ 



7. The Corrected Methods A and B are Efficient 

In this section, we show that Methods A and B are more efficient than any other solution to
the same problem we can think of. Let n and a denote, respectively, the number of nodes and
arcs in G. Moreover, let us assume hereinafter that a DAG is implemented as an adjacency
matrix, whereas a node ordering is implemented as an array with an entry per node indicating
the position of the node in the ordering. Since I(G) is a graphoid, the first solution we can
think of consists in applying the following characterization of G_α: For each node A, Pa_{G_α}(A)
is the smallest subset X ⊆ Pre_α(A) such that A ⊥_G Pre_α(A) \ X | X. This solution implies
evaluating for each node A all the O(2^n) subsets of Pre_α(A). Evaluating a subset implies
checking a separation statement in G, which takes O(a) time (Geiger et al., 1990, p. 530).
Therefore, the overall runtime of this solution is O(a n 2^n).
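
The characterization above translates directly into an exponential-time procedure, sketched here in Python for illustration only. The sketch assumes a DAG stored as a dictionary of parent sets, and it tests separation through the moralized ancestral graph, which is a standard equivalent test but not the O(a) procedure of Geiger et al. The names ancestors, separated and mdi_map, and the example arcs (I → K, J → K, J → L, L → M, inferred from the trace in Section 5), are assumptions.

```python
from itertools import combinations


def ancestors(pa, nodes):
    """Return the given nodes together with all their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in pa[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def separated(pa, a, bs, s):
    """True if node a is separated from every node in bs given s in the DAG."""
    bs, s = set(bs), set(s)
    anc = ancestors(pa, {a} | bs | s)
    adj = {v: set() for v in anc}
    for v in anc:                       # moralize the ancestral subgraph
        for p in pa[v]:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(pa[v], 2):
            adj[p].add(q)
            adj[q].add(p)
    seen, stack = {a}, [a]              # search from a, never entering s
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen and w not in s:
                seen.add(w)
                stack.append(w)
    return not (seen & bs)


def mdi_map(pa, alpha):
    """Parent sets of the MDI map of the DAG relative to alpha (brute force)."""
    result = {}
    for i, a in enumerate(alpha):
        pre = alpha[:i]
        for size in range(len(pre) + 1):          # smallest subsets first
            hit = next((set(x) for x in combinations(pre, size)
                        if separated(pa, a, set(pre) - set(x), x)), None)
            if hit is not None:
                result[a] = hit
                break
    return result


# Assumed arcs of G: I -> K, J -> K, J -> L, L -> M.
G = {"I": set(), "J": set(), "K": {"I", "J"}, "L": {"J"}, "M": {"L"}}
G_alpha = mdi_map(G, ["M", "I", "K", "J", "L"])
print(G_alpha)
```

On this assumed input, I receives no parents, so I and M are not adjacent in the resulting DAG, in agreement with the claim in Section 5 that I and M are separated in G_α.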

Since I(G) satisfies the composition property in addition to the graphoid properties, a more
efficient solution consists in running the incremental association Markov boundary (IAMB)
algorithm (Peña et al., 2007, Theorem 8) for each node A to find Pa_{G_α}(A). The IAMB
algorithm first sets Pa_{G_α}(A) = ∅ and, then, proceeds with the following two steps. The first
step consists in iterating through the following line until Pa_{G_α}(A) does not change: Take any
node B ∈ Pre_α(A) \ Pa_{G_α}(A) such that A ⊥̸_G B | Pa_{G_α}(A) and add it to Pa_{G_α}(A). The second
step consists in iterating through the following line until Pa_{G_α}(A) does not change: Take any
node B ∈ Pa_{G_α}(A) that has not been considered before and such that A ⊥_G B | Pa_{G_α}(A) \ {B},
and remove it from Pa_{G_α}(A). The first step of the IAMB algorithm can add O(n) nodes to
Pa_{G_α}(A). Each addition implies evaluating O(n) candidates for the addition, since Pre_α(A)
has O(n) nodes. Evaluating a candidate implies checking a separation statement in G, which
takes O(a) time (Geiger et al., 1990, p. 530). Then, the first step of the IAMB algorithm
runs in O(a n^2) time. Similarly, the second step of the IAMB algorithm runs in O(a n) time.
Therefore, the IAMB algorithm runs in O(a n^2) time. Since the IAMB algorithm has to be
run once for each of the n nodes, the overall runtime of this solution is O(a n^3).

We now analyze the efficiency of Methods A and B. To be more exact, we analyze Methods
A2 and B2 (recall Figure 3) rather than the original Methods A and B (recall Figure 1),
because the former are more efficient than the latter. Methods A2 and B2 run in O(n^3) time.
First, note that Construct β runs in O(n^3) time. The algorithm iterates n times through lines
3-10 and, in each of these iterations, it iterates O(n) times through lines 5-8. Moreover, line 3
takes O(n^2) time, line 6 takes O(1) time, and line 9 takes O(n) time. Now, note that Methods
A2 and B2 iterate n times through lines 2-8 and, in each of these iterations, they iterate O(n)
times through lines 3-7. Moreover, line 4 takes O(1) time, and line 5 takes O(n) time because
covering an arc implies updating the adjacency matrix accordingly. Consequently, Methods
A and B are more efficient than any other solution to the same problem we can think of.

Finally, we analyze the complexity of Method G2H. Method G2H runs in O(n^3) time:
α can be constructed in O(n^3) time by calling Construct β(H, γ) where γ is any node
ordering, running Method B2 takes O(n^3) time, and adding to G the arcs that are in H
but not in G can be done in O(n^2) time. Recall that Method G2H is an alternative to the
algorithm in (Chickering, 2002). Unfortunately, no implementation details are provided in
(Chickering, 2002) and, thus, a comparison with the runtime of the algorithm there is not
possible. However, we believe that our algorithm is more efficient.

8. Discussion 

In this paper, we have studied the problem of combining several given DAGs into a con- 
sensus DAG that only represents independences all the given DAGs agree upon and that 
has as few parameters associated as possible. Although our definition of consensus DAG is 
reasonable, we would like to leave out the number of parameters associated and focus solely 
on the independencies represented by the consensus DAG. In other words, we would like 
to define the consensus DAG as the DAG that only represents independences all the given 
DAGs agree upon and as many of them as possible. We are currently investigating whether 
both definitions are equivalent. In this paper, we have proven that there may exist several 
non-equivalent consensus DAGs. In principle, any of them is equally good. If we were able to 
conclude that one represents more independencies than the rest, then we would prefer that 
one. In this paper, we have proven that finding a consensus DAG is NP-hard. This made 
us resort to heuristics to find an approximated consensus DAG. This does not mean that 
we discard the existence of fast super-polynomial algorithms for the general case, or polyno- 
mial algorithms for constrained cases such as when the given DAGs have bounded in-degree. 
This is a question that we are currently investigating. In this paper, we have considered the 
heuristic originally proposed by Matzkevich and Abramson (1992, 1993a,b). This heuristic
takes as input a node ordering, and we have shown that finding the best node ordering for
the heuristic is NP-hard. We are currently investigating the application of meta-heuristics in 
the space of node orderings to find a good node ordering for the heuristic. Our preliminary 
experiments indicate that this approach is highly beneficial, and that the best node ordering 
almost never coincides with any of the node orderings that are consistent with some of the 
given DAGs. 
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Appendix: Proofs of Lemmas [DJ3] 
Lemma 1. Method A(G, a) and Method A2(G, a) return the same DAG. 

Proof. It is evident that Methods A and A2 transform /3 into a and, thus, that they halt 
at some point. We now prove that they return the same DAG. We prove this result by 
induction on the number of times that Method A executes line 6 before halting. It is evident 
that the result holds if the number of executions is one, because Methods A and A2 share 
line 1. Assume as induction hypothesis that the result holds for up to k — 1 executions. We 
now prove it for k executions. Let Y and Z denote the nodes involved in the first of the k 
executions. Since the induction hypothesis applies for the remaining k — 1 executions, the 
run of Method A can be summarized as 

If Z — > Y is in G then cover and reverse Z — > Y in G 
Interchange Y and Z in 
For i = 1 to n do 

Percolate right-to-left in (3 the leftmost node in that has not been percolated before 

where n is the number of nodes in G. Now, assume that Y is percolated when i = j. Note 
that the first j — 1 percolations only involve nodes to the left of Y in /3. Thus, the run above 
is equivalent to 

For i = 1 to j - 1 do
    Percolate right-to-left in β the leftmost node in β that has not been percolated before
If Z → Y is in G then cover and reverse Z → Y in G
Interchange Y and Z in β
Percolate Y right-to-left in β
Percolate Z right-to-left in β
For i = j + 2 to n do
    Percolate right-to-left in β the leftmost node in β that has not been percolated before.

Now, let W denote the nodes to the left of Z in β before the first of the k executions of line
6. Note that the fact that Y and Z are the nodes involved in the first execution implies that
the nodes in W are also to the left of Z in α. Note also that, when Z is percolated in the
latter run above, the nodes to the left of Z in β are exactly W ∪ {Y}. Since all the nodes
in W ∪ {Y} are also to the left of Z in α, the percolation of Z in the latter run above does
not perform any arc covering and reversal or node interchange. Thus, the latter run above is
equivalent to

For i = 1 to j - 1 do
    Percolate right-to-left in β the leftmost node in β that has not been percolated before
Percolate Z right-to-left in β
Percolate Y right-to-left in β
For i = j + 2 to n do
    Percolate right-to-left in β the leftmost node in β that has not been percolated before

which is exactly the run of Method A2. Consequently, Methods A and A2 return the same 
DAG. 

□ 
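
The ordering side of this argument can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the percolation rule is inferred from the proof above (a node moves left while its left neighbor in β is to its right in α), the arcs of G are ignored, and the names percolate_right_to_left and method_a2_ordering are hypothetical. Under these assumptions the loop behaves like an insertion sort of β with respect to α, which is why it transforms β into α and halts. Each recorded pair (Z, Y) marks a point where the method would cover and reverse Z → Y if that arc were in G.

```python
def percolate_right_to_left(beta, alpha_rank, node, swaps):
    """Move `node` leftwards in beta while its left neighbor is to its right in alpha.

    Assumed percolation rule (inferred from the proof, not quoted from the paper).
    """
    pos = beta.index(node)
    while pos > 0 and alpha_rank[beta[pos - 1]] > alpha_rank[node]:
        z = beta[pos - 1]
        # The actual method would cover and reverse z -> node here if the arc is in G.
        swaps.append((z, node))
        beta[pos - 1], beta[pos] = node, z
        pos -= 1


def method_a2_ordering(beta, alpha):
    """Ordering-only sketch: repeatedly percolate the leftmost unpercolated node."""
    beta = list(beta)
    alpha_rank = {v: i for i, v in enumerate(alpha)}
    percolated, swaps = set(), []
    for _ in range(len(beta)):
        node = next(v for v in beta if v not in percolated)
        percolate_right_to_left(beta, alpha_rank, node, swaps)
        percolated.add(node)
    return beta, swaps


final, swaps = method_a2_ordering(["C", "A", "B"], ["A", "B", "C"])
print(final)  # ['A', 'B', 'C']
print(swaps)  # [('C', 'A'), ('C', 'B')]
```

Percolating a node whose left neighbor in β already lies to its left in α records no interchange, which is the observation used for Z in the proof.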

Lemma 2. Method A2(G, α) and Method B(G, α) return the same DAG.

Proof. We can prove the lemma in much the same way as Lemma 1. We simply need to
replace Y by Z and vice versa in the proof of Lemma 1.

□ 
Lemma 3. Method B(G, α) and Method B2(G, α) return the same DAG.

Proof. It is evident that Methods B and B2 transform β into α and, thus, that they halt
at some point. We now prove that they return the same DAG. We prove this result by
induction on the number of times that Method B executes line 6 before halting. It is evident
that the result holds if the number of executions is one, because Methods B and B2 share
line 1. Assume as induction hypothesis that the result holds for up to k - 1 executions. We
now prove it for k executions. Let Y and Z denote the nodes involved in the first of the k
executions. Since the induction hypothesis applies for the remaining k - 1 executions, the
run of Method B can be summarized as

If Y → Z is in G then cover and reverse Y → Z in G
Interchange Y and Z in β
For i = 1 to n do
    Percolate left-to-right in β the rightmost node in α that has not been percolated before

where n is the number of nodes in G. Now, assume that Y is the j-th rightmost node in α.
Note that, for all 1 ≤ i < j, the i-th rightmost node W_i in α is to the right of Y in β when
W_i is percolated in the run above. To see it, assume to the contrary that W_i is to the left of
Y in β. This implies that W_i is also to the left of Z in β, because Y and Z are neighbors in
β. However, this is a contradiction because W_i would have been selected in line 2 instead of
Y for the first execution of line 6. Thus, the first j - 1 percolations in the run above only
involve nodes to the right of Z in β. Then, the run above is equivalent to

For i = 1 to j - 1 do
    Percolate left-to-right in β the rightmost node in α that has not been percolated before
If Y → Z is in G then cover and reverse Y → Z in G
Interchange Y and Z in β
For i = j to n do
    Percolate left-to-right in β the rightmost node in α that has not been percolated before

which is exactly the run of Method B2.

□ 
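
The left-to-right loop in the last run above can also be sketched in Python for the node ordering alone. This is a hedged illustration, not the paper's procedure: the percolation rule is an assumption inferred from the proof (a node moves right while its right neighbor in β is to its left in α), the arcs of G are not modelled, and the names percolate_left_to_right and method_b2_ordering are hypothetical. Each recorded pair (Y, Z) marks a point where the method would cover and reverse Y → Z if that arc were in G.

```python
def percolate_left_to_right(beta, alpha_rank, node, swaps):
    """Move `node` rightwards in beta while its right neighbor is to its left in alpha.

    Assumed percolation rule (inferred from the proof, not quoted from the paper).
    """
    pos = beta.index(node)
    while pos < len(beta) - 1 and alpha_rank[beta[pos + 1]] < alpha_rank[node]:
        z = beta[pos + 1]
        # The actual method would cover and reverse node -> z here if the arc is in G.
        swaps.append((node, z))
        beta[pos], beta[pos + 1] = z, node
        pos += 1


def method_b2_ordering(beta, alpha):
    """Ordering-only sketch: percolate the nodes of alpha from rightmost to leftmost."""
    beta = list(beta)
    alpha_rank = {v: i for i, v in enumerate(alpha)}
    swaps = []
    for node in reversed(alpha):
        percolate_left_to_right(beta, alpha_rank, node, swaps)
    return beta, swaps


final_b2, swaps_b2 = method_b2_ordering(["C", "A", "B"], ["A", "B", "C"])
print(final_b2)  # ['A', 'B', 'C']
print(swaps_b2)  # [('C', 'A'), ('C', 'B')]
```

In this sketch the rightmost node of α reaches the right end of β first, and every later node stops as soon as its right neighbor lies to its right in α, so β ends up equal to α.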